This weeks #TidyTuesday dataset contains data on R downloads over a single year from RStudio CRAN mirror between October 20, 2017 and October 20, 2018. Source: cran-logs.rstudio.com. The data contains date, time, size, version, operating system and country variables.
library(tidyverse)
library(stringr)
library(knitr)
library(here)
library(hrbrthemes)
library(gghighlight)
library(scales)
library(ggthemr)
library(waffle)
#Set theme as Dust using ggthemr
ggthemr(palette = "dust", layout = "clear", set_theme = TRUE)
#Import data from website
#Rdown <- read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2018-10-30/r_downloads_year.csv")
Rdown <- read_csv(here("R_downloads.csv"))
kable(head(Rdown))
| X1 | X1_6 | X1_5 | X1_4 | X1_3 | X1_2 | X1_1 | date | time | size | version | os | country | ip_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 2017-10-23 | 14:29:18 | 78171332 | 3.4.2 | win | ES | 1 |
| 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2017-10-23 | 14:29:22 | 20692638 | 3.4.2 | win | PT | 2 |
| 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2017-10-23 | 14:29:57 | 972075 | 3.4.2 | win | PL | 3 |
| 4 | 4 | 4 | 4 | 4 | 4 | 4 | 2017-10-23 | 14:30:00 | 1032203 | 3.0.3 | win | JP | 4 |
| 5 | 5 | 5 | 5 | 5 | 5 | 5 | 2017-10-23 | 14:30:18 | 78171332 | 3.4.2 | win | CN | 5 |
| 6 | 6 | 6 | 6 | 6 | 6 | 6 | 2017-10-23 | 14:30:50 | 64228612 | 3.4.2 | osx | US | 6 |
#Limit to top 20 countries
Rdown_20 <- Rdown %>% group_by(country) %>% summarise(country_n = n()) %>% top_n(country_n,n=20)
bar <- ggplot(Rdown_20, aes(reorder(country, country_n), y = country_n)) + geom_bar(stat = "identity") + coord_flip() +
scale_y_continuous(labels = comma_format()) + labs(
x = " ",
y = "Number of downloads",
title = "Top 20 countries downloading R between October 20, 2017 and October 20, 2018",
caption = "Source: cran-logs.rstudio.com")
bar
#Summarise downloads by operating system
os <- Rdown %>% group_by(os) %>% summarise(os_n = n()) %>% mutate(percent = (os_n/sum(os_n))*100)
os_percent <- c(`OSX` = 21, `Windows` = 74, `Src` = 5)
#Make a waffle chart
waffle <- waffle(os_percent, rows=8, colors = c("#969696", "#1879bf", "#db735c"), legend_pos = "bottom", xlab = NULL, title = "R downloads (%) by operating system\nOctober 20, 2017 and October 20, 2018")
waffle
Rversion <- Rdown %>% group_by(version) %>% summarise(version_n = n()) %>% top_n(version_n,n=10)
bar2 <-
ggplot(Rversion, aes(x = reorder(version, version_n), y = version_n)) + geom_bar(stat = "identity") + coord_flip() +
scale_y_continuous(labels = comma_format()) + labs(
x = " ",
y = "Number of downloads",
title = "Top 10 R versions downloaded between October 20, 2017 and October 20, 2018",
caption = "Source: cran-logs.rstudio.com"
)
bar2
#Use lubridate to extract month and day?
library(lubridate)
Rdown2 <- Rdown %>% mutate(mth = month(date)) %>% mutate(day = day(date))
Rdown_date <- Rdown2 %>% group_by(mth, day) %>% summarise(n()) %>% rename(count = `n()`)
hm <- ggplot(data = Rdown_date, aes(x = mth, y = day)) +
geom_tile(aes(fill = count))
hm
library(plotly)
Rdown_date2 <- Rdown2 %>% group_by(date) %>% summarise(n()) %>% rename(count = `n()`)
line <- ggplot(Rdown_date2, aes(date, count)) + geom_line() + geom_smooth(se=FALSE, color = "steel blue") +
scale_y_continuous(labels = comma_format()) + labs(
x = " ",
y = " ",
title = "R downloads between October 20, 2017 and October 20, 2018",
caption = "Source: cran-logs.rstudio.com"
)
line
#Make an interactive version of the line chart
ggplotly(line)
Between October 2017 and October 2018:
Downloads of R increased substantially in August 2018. Version 3.5.1 was released on 2 July 2018.
Carson Sievert, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec and Pedro Despouy (2017). plotly: Create Interactive Web Graphics via ‘plotly.js’. R package version 4.7.1. https://CRAN.R-project.org/package=plotly
Bob Rudis (2018). hrbrthemes: Additional Themes, Theme Components and Utilities for ‘ggplot2’. R package version 0.5.0. https://CRAN.R-project.org/package=hrbrthemes
Hadley Wickham (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse
Hadley Wickham (2018). stringr: Simple, Consistent Wrappers for Common String Operations. R package version 1.3.0. https://CRAN.R-project.org/package=stringr
Yihui Xie (2018). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.20.
Bob Rudis and Dave Gandy (2017). waffle: Create Waffle Chart Visualizations in R. R package version 0.7.0. https://CRAN.R-project.org/package=waffle
Ciaran Tobin (NA). ggthemr: Themes for ggplot2. R package version 1.1.0.